ABSTRACT


Optical Character Recognition (OCR) is the process of classifying optical patterns as alphanumeric or other characters. It involves segmentation, feature extraction and classification.


Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning.


The aim of this project is to extract text from images using deep-learning-based OCR.
1. INTRODUCTION

OCR, or optical character recognition, is one of the earliest
computer vision tasks to be addressed, since some aspects of it
do not require deep learning. As a result, various OCR
implementations existed even before the deep learning boom
of 2012.


This leads many people to think that the OCR challenge is
“solved” and no longer challenging. A related belief, which
comes from similar sources, is that OCR does not require deep
learning; in other words, that using deep learning for OCR is overkill.


2. Existing System

There is a growing demand for users to convert printed
documents into electronic documents in order to maintain
the security of their data.


The basic OCR system was therefore invented to convert data
available on paper into computer-processable documents, so
that the documents become editable and reusable.


A drawback of early OCR systems is that they could recognize
and convert only documents written in English or in one
specific language.
3. Motivation And Scope

Optical Character Recognition is needed when the 
information should be readable both to humans and to a 
machine. 


The scope of this project is to provide efficient, enhanced
software that lets users perform document image analysis and
document processing by reading and recognizing characters.
The target users are research, academic, governmental and
business organizations that hold a large pool of documents
and scanned images.


4. SYSTEM ARCHITECTURE 

The components of the system include preprocessing and
feature extraction.

Preprocessing: This sub-system performs noise removal,
deblurring, filtering and binarization on the input image. It
then segments the individual characters out of the
preprocessed ancient documents.

Feature Extraction: This component extracts features from
the input image and stores them in a feature vector.
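The preprocessing and feature extraction stages can be sketched in plain Python on a grayscale image stored as a list of rows with integer pixel values from 0 to 255. This is a minimal, hypothetical illustration: the source does not say which filter, thresholding method or features the system uses, so the 3x3 median filter (noise removal), Otsu's global threshold (binarization) and zoning density features below are assumptions, and deblurring and character segmentation are omitted. All function names are illustrative only.

```python
from statistics import median

# Assumed input: grayscale image as a list of rows, integer pixels in 0..255.


def median_filter(img):
    """Noise removal with a 3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


def otsu_threshold(img):
    """Return the gray level that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * hist[i] for i in range(256))
    best_t, best_var = 0, -1.0
    w_b = sum_b = 0  # pixel count and intensity sum of the dark class (<= t)
    for t in range(256):
        w_b += hist[t]
        sum_b += t * hist[t]
        w_f = total - w_b
        if w_b == 0:
            continue
        if w_f == 0:
            break
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(img, threshold):
    """1 = ink, 0 = paper; assumes dark text on a light background."""
    return [[1 if p <= threshold else 0 for p in row] for row in img]


def zoning_features(binary, zones=(2, 2)):
    """Feature vector: fraction of ink pixels in each cell of a rows x cols grid."""
    h, w = len(binary), len(binary[0])
    rows, cols = zones
    features = []
    for i in range(rows):
        for j in range(cols):
            cells = [binary[y][x]
                     for y in range(i * h // rows, (i + 1) * h // rows)
                     for x in range(j * w // cols, (j + 1) * w // cols)]
            features.append(sum(cells) / len(cells) if cells else 0.0)
    return features
```

Chaining `binarize(img, otsu_threshold(img))` and `zoning_features(...)` turns a character image into a fixed-length vector that a classifier can consume. Zoning is used here only because it is short; a deep learning system would typically learn its features with a CNN instead.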
5. ARCHITECTURE OF OCR

6. LIST OF MODULES
The recognition system has two main modules:

> Text detection based on a Connectionist Text Proposal Network (CTPN).

> Text recognition based on an attention-based encoder-decoder.
Text detection based on Connectionist Text Proposal Network 


The Connectionist Text Proposal Network (CTPN) accurately localizes text lines in natural images. It detects a text line
as a sequence of fine-scale text proposals directly in the convolutional feature maps.


The CTPN works reliably on multi-scale and multi-language text without further post-processing, departing from previous
bottom-up methods that require multi-step post-filtering.
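Joining fine-scale proposals into text lines can be sketched as follows. This is a simplified, hypothetical version: it assumes the network has already produced fixed-width proposals (16 pixels wide, as in the CTPN paper), each with a vertical extent and a text score, and it links them greedily from left to right, whereas the original method pairs each proposal with its nearest horizontal neighbour. The score cutoff (0.7), maximum horizontal distance (50 px) and minimum vertical overlap (0.7) are taken as the paper's reported values and should be treated as tunable assumptions; all names are illustrative.

```python
from dataclasses import dataclass

PROPOSAL_WIDTH = 16  # fixed proposal width in pixels (assumed, following CTPN)


@dataclass
class Proposal:
    x: float      # left edge of the fixed-width slice
    y0: float     # top edge
    y1: float     # bottom edge
    score: float  # text / non-text confidence


def vertical_overlap(a, b):
    """Intersection over union of the two proposals' vertical extents."""
    inter = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    union = max(a.y1, b.y1) - min(a.y0, b.y0)
    return inter / union if union > 0 else 0.0


def build_text_lines(proposals, min_score=0.7, max_gap=50, min_overlap=0.7):
    """Greedily chain confident proposals, left to right, into text lines."""
    candidates = sorted((p for p in proposals if p.score >= min_score), key=lambda p: p.x)
    lines = []
    for p in candidates:
        for line in lines:
            last = line[-1]
            # Link p to a line whose rightmost proposal is close and vertically aligned.
            if 0 < p.x - last.x < max_gap and vertical_overlap(last, p) >= min_overlap:
                line.append(p)
                break
        else:
            lines.append([p])  # no compatible line: start a new one
    return lines


def line_boxes(lines):
    """Bounding box (x0, y0, x1, y1) of each text line."""
    return [(min(p.x for p in line), min(p.y0 for p in line),
             max(p.x for p in line) + PROPOSAL_WIDTH, max(p.y1 for p in line))
            for line in lines]
```

Because every proposal has the same width, only the vertical extent has to be predicted per slice, and a text line of any length is simply the union of a chain of slices.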


Text recognition based on Attention-based Encoder-Decoder 


The accurate and rich semantic information carried by text is important in many applications, such as image
search, intelligent inspection, product recognition and autonomous driving. For these reasons, scene text recognition has
been an active research field in computer vision.


Although optical character recognition in scanned documents has been considered a solved problem, recognizing text in natural scene images remains challenging.
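One decoding step of an attention-based encoder-decoder can be sketched as follows: the decoder state scores every encoder time step, the scores are normalized with a softmax, and the weighted sum of encoder states (the context vector) is what the decoder uses to emit the next character. The source does not specify the attention function; plain dot-product scoring is assumed here for brevity, whereas many text recognizers use a learned additive score. Vectors are plain Python lists, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention: returns (weights, context vector)."""
    # Alignment score between the decoder state and each encoder time step.
    scores = [sum(d * h for d, h in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(w * enc[k] for w, enc in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context
```

For example, `attend([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` gives equal weights `[0.5, 0.5]` and the context `[0.5, 0.5]`; a decoder state aligned with the first encoder state shifts almost all the weight onto that time step. In a full model the context would feed a recurrent decoder cell followed by a classifier over the character set.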


7. ALGORITHM 

Convolutional Recurrent Neural Networks (CRNN)

> Convolutional Neural Networks (CNNs).

> Recurrent Neural Networks (RNNs).

> Long Short-Term Memory networks (LSTMs).
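In the CRNN architecture as originally published (Shi et al.), the convolutional layers turn the text image into a sequence of feature columns, bidirectional LSTM layers predict a distribution over characters plus a special "blank" symbol for every column, and a CTC transcription layer converts those per-frame predictions into a string. The source lists only the network components, so CTC transcription is an assumption here. The sketch below covers greedy (best-path) CTC decoding only: take the most likely symbol per frame, merge consecutive repeats, then drop blanks. It does not cover training or beam search; the function name and the convention that index 0 is the blank are hypothetical.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding.

    frame_probs: one probability list per time step (frame), indexed like `alphabet`.
    alphabet: symbol for each class index; the entry at `blank` is never emitted.
    """
    # 1. Most likely class per frame.
    best = [max(range(len(probs)), key=probs.__getitem__) for probs in frame_probs]
    # 2. Collapse consecutive repeats, 3. remove blanks.
    chars = []
    prev = None
    for k in best:
        if k != prev and k != blank:
            chars.append(alphabet[k])
        prev = k
    return "".join(chars)
```

With `alphabet = "-ab"` (index 0 as the blank) and per-frame best labels 1, 1, 0, 1, 2, 2, 0, the decoder outputs `"aab"`: the blank between the two runs of "a" is what allows CTC to emit a doubled letter.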
8. RESULT